Abstract. In one-bit compressed sensing, previous results state that sparse signals may be robustly
recovered when the measurements are taken using Gaussian random vectors. In contrast to
standard compressed sensing, these results are not extendable to natural non-Gaussian distributions
without further assumptions, as can be demonstrated by simple counter-examples. We show that
approximately sparse signals, which also satisfy a mild infinity-norm constraint, can be accurately
reconstructed from single-bit measurements sampled according to a sub-gaussian distribution, and
the reconstruction comes as the solution to a convex program.

1. Introduction

In the standard noiseless compressed sensing model, one has access to linear measurements of
the form

$y_i = \langle a_i, x \rangle, \qquad i = 1, 2, \ldots, m,$

where $a_1, \ldots, a_m \in \mathbb{R}^n$ are known measurement vectors and $x \in \mathbb{R}^n$ is a sparse signal which
one wishes to reconstruct (see e.g. [2]). Let $\|x\|_0$ denote the number of nonzero entries in $x$.
Typical results state that when the measurement vectors are chosen randomly from a sub-gaussian
distribution, and $\|x\|_0 \le s$, then $m = O(s \log(n/s))$ measurements are sufficient for robust recovery
of the signal $x$ (see [2]).

In one-bit compressed sensing, the measurements are compressed to single bits, and thus they
take the form

(1.1)   $y_i = \operatorname{sign}(\langle a_i, x \rangle), \qquad i = 1, 2, \ldots, m.$

Here, the sign function is defined by $\operatorname{sign}(t) = 1$ when $t > 0$ and $-1$ otherwise. Clearly, the
magnitude of $x$ is lost in these measurements and so the goal is to approximate the direction of $x$.
Thus we may assume without loss of generality that $x \in S^{n-1}$.
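
The measurement model (1.1) can be illustrated with a small numerical sketch. The dimensions, the random seed, the Gaussian choice of measurement vectors and the helper name `one_bit_measurements` are assumptions made only for this illustration; the point is that rescaling $x$ by a positive constant leaves every bit unchanged.

```python
import numpy as np

def one_bit_measurements(A, x):
    """Return y_i = sign(<a_i, x>), with sign(t) = 1 for t > 0 and -1 otherwise."""
    return np.where(A @ x > 0, 1, -1)

rng = np.random.default_rng(0)       # fixed seed, for reproducibility of the sketch only
n, m = 50, 200                       # hypothetical signal dimension and number of measurements
A = rng.standard_normal((m, n))      # rows play the role of the measurement vectors a_i
x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # place x on the unit sphere S^{n-1}

y = one_bit_measurements(A, x)
y_scaled = one_bit_measurements(A, 7.5 * x)   # any positive rescaling of x

# The bits carry no information about the magnitude of x.
print(np.array_equal(y, y_scaled))
```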

One-bit compressed sensing was introduced in [1] to model extreme quantization in compressed 
sensing; the webpage http://dsp.rice.edu/lbitCS/ details its practical applications and the 
recent literature. We also note the similarity in model to sparse logistic regression; the connection 
will be made clear in Section 1.1. We review the previous results here; we note that there are
algorithmic results, theoretical results, and results which consider quantization with more than two 
bits (see the above webpage for more details). We restrict our review to the theoretical results that 
consider one-bit quantization. 

Suppose that the signal $x \in \mathbb{R}^n$ satisfies $\|x\|_0 \le s$. Gupta et al. [3] assume that the measurement
vectors are Gaussian and demonstrate that the support of $x$ can tractably be recovered from
either 1) $O(s \log n)$ nonadaptive measurements assuming a constant dynamic range of $x$ (i.e. the
magnitude of all nonzero entries of $x$ is assumed to lie between two constants), or 2) $O(s \log n)$
adaptive measurements. Jacques et al. [4] introduce a certain binary $\epsilon$-stable embedding property
which is a one-bit analogue to the restricted isometry property of standard compressed sensing.
They demonstrate that Gaussian measurement ensembles satisfy this property with high probability
(given enough measurements). Assuming the binary $\epsilon$-stable embedding property holds, they show
that any estimate of $x$ which is both $s$-sparse and approximately matches the data will be accurate.
In particular, $O(s \log n)$ Gaussian measurements are sufficient to have a relative error bounded by
any fixed constant. These results are robust to noise.

Plan and Vershynin [7, 8] show that one may reconstruct a sparse signal $x$ from single-bit
measurements by convex programming, for which tractable solvers exist. [7] considers the noiseless
case and [8] considers the noisy case (and also sparse logistic regression). In [8] and the present
paper, the model for the signal $x$ is allowed to be quite general, with sparsity as a special case.
Indeed, suppose $x$ belongs to some known set $K$ which is meant to encode the model of the signal
structure. For example, in order to encode sparsity, one could let $K$ be the set

$S_{n,s} := \{x \in \mathbb{R}^n : \|x\|_0 \le s,\ \|x\|_2 \le 1\}.$

The recovery is achieved in [8] by solving the optimization problem

(1.2)   $\max \sum_{i=1}^m y_i \langle a_i, x' \rangle$ subject to $x' \in K.$

If $K$ is a convex set then (1.2) is a convex optimization problem, so it can be solved by a variety
of convex optimization solvers.

However, the reader may note that the set of sparse vectors $S_{n,s}$ is extremely non-convex. To
overcome this, it was proposed in [8] to take $K$ to be an approximate convex relaxation of $S_{n,s}$ (see
[7, Lemma 3.1]), namely

(1.3)   $K = K_{n,s} := \{x \in \mathbb{R}^n : \|x\|_2 \le 1,\ \|x\|_1 \le \sqrt{s}\}.$

It was shown in [8] that $m = O(s \log(n/s))$ Gaussian measurements are sufficient to accurately
recover $x$ by solving the convex optimization problem (1.2).
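
As a rough illustration of how the program (1.2) over the relaxed set (1.3) might be solved in practice, the sketch below uses a technique that is not taken from this paper: since the objective is linear, $\langle v, x' \rangle$ with $v = \frac{1}{m}\sum_i y_i a_i$, a maximizer over the intersection of the $\ell_2$ and $\ell_1$ balls can be written as a normalized soft-thresholding of $v$, with the threshold found by bisection. The function names, the dimensions and the test signal are all hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Shrink every entry of v towards zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def maximize_over_K(v, s, iters=100):
    """Maximize <v, x'> over {||x'||_2 <= 1, ||x'||_1 <= sqrt(s)} for a nonzero v."""
    def l1_over_l2(tau):
        u = soft_threshold(v, tau)
        nrm = np.linalg.norm(u)
        return np.abs(u).sum() / nrm if nrm > 0 else 1.0

    if l1_over_l2(0.0) <= np.sqrt(s):          # l1 constraint inactive
        return v / np.linalg.norm(v)
    lo, hi = 0.0, np.abs(v).max()              # invariant: ratio(lo) > sqrt(s) >= ratio(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if l1_over_l2(mid) > np.sqrt(s):
            lo = mid
        else:
            hi = mid
    u = soft_threshold(v, hi)
    if not u.any():                            # degenerate case: keep the largest entry only
        j = int(np.argmax(np.abs(v)))
        u = np.zeros_like(v)
        u[j] = np.sign(v[j])
    return u / np.linalg.norm(u)

rng = np.random.default_rng(1)
n, s, m = 200, 5, 2000                         # hypothetical problem sizes
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = 1.0 / np.sqrt(s)   # s-sparse unit vector
A = rng.standard_normal((m, n))                # Gaussian measurements, as in [8]
y = np.where(A @ x > 0, 1, -1)

v = A.T @ y / m                                # objective of (1.2) is <v, x'> up to scaling
x_hat = maximize_over_K(v, s)
err = np.linalg.norm(x_hat - x)
print(err)
```

A general-purpose convex solver would serve equally well here; the closed form is used only to keep the sketch self-contained.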

A natural question is whether reconstruction of $x$ from one-bit measurements is still feasible when
measurements are taken using random vectors with non-Gaussian coordinates. A simple counterexample
shows that this is not generally possible even when the coordinates are sub-gaussian. Suppose
that all coordinates of $a_i$ are in $\{-1, 1\}$. For example, one may let the coordinates be independent
symmetric Bernoulli random variables. Then the vectors

$x = (1, \tfrac{1}{2}, 0, \ldots, 0)$ and $x' = (1, 0, \ldots, 0)$

clearly satisfy $\operatorname{sign}(\langle a_i, x \rangle) = \operatorname{sign}(\langle a_i, x' \rangle)$ for every $i$, since both signs equal the first
coordinate of $a_i$. This shows that one cannot distinguish the two very different signals $x$ and $x'$
by such measurements,^1 even if infinitely many measurements are taken.
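
The counterexample is easy to check numerically. The sketch below assumes the second coordinate of $x$ equals $1/2$ (any value strictly between 0 and 1 behaves the same way) and uses hypothetical sizes; it confirms that symmetric Bernoulli measurements produce identical bits for $x$ and $x'$, although the two directions are far apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 10_000                          # hypothetical dimension and number of measurements
A = rng.choice([-1, 1], size=(m, n))       # symmetric Bernoulli measurement vectors

x = np.zeros(n)
x[0], x[1] = 1.0, 0.5                      # second coordinate 1/2 is an assumption of this sketch
x_prime = np.zeros(n)
x_prime[0] = 1.0

def sign(t):
    return np.where(t > 0, 1, -1)

y = sign(A @ x)
y_prime = sign(A @ x_prime)
identical = np.array_equal(y, y_prime)     # every bit equals the first coordinate of a_i

# Distance between the two directions after normalizing x to the unit sphere.
dist = np.linalg.norm(x / np.linalg.norm(x) - x_prime)
print(identical, dist)
```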

One may ask whether this counterexample is typical or worst-case behavior. In this paper, we 
demonstrate that the latter is the case — a difficulty can only arise for extremely sparse signals. 
Namely, we show that under the assumption 

(1.4)   $\|x\|_\infty \ll \|x\|_2 = 1,$

an approximate recovery of x is still possible with general sub-gaussian measurements, and it 
is achieved by the convex program (1.2). Furthermore, we prove that for the distributions that 
are near Gaussian (in total variation), an approximate recovery of x is possible even without the 
assumption (1.4). 

1.1. Main Results. We shall assume that the signal set $K$ lies in the unit Euclidean ball in $\mathbb{R}^n$,
which we shall denote $B_2^n$. The quality of recovery of a signal $x \in K$ will depend on $K$ through a
single geometric parameter: the Gaussian mean width of $K$. It is defined as

$w(K) = \mathbb{E} \sup_{x \in K - K} \langle g, x \rangle,$

where $g$ denotes a standard Gaussian random vector in $\mathbb{R}^n$, i.e. a vector with independent $N(0, 1)$
random coordinates. The reader may refer to [8, Section 2] for a brief overview of the properties
of mean width.
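
To get a feel for this parameter, the following Monte Carlo sketch estimates the mean width for two simple symmetric sets, taking the supremum over $K - K$ as in [8]: the unit ball $B_2^n$, where the supremum equals $2\|g\|_2$, and the set of $s$-sparse unit vectors, where it equals twice the Euclidean norm of the $s$ largest entries of $|g|$. The sizes, the number of trials and the function names are assumptions of the sketch.

```python
import numpy as np

def mean_width_ball(n, trials, rng):
    """Monte Carlo estimate of E sup_{x in K-K} <g, x> for K = B_2^n (equals E 2||g||_2)."""
    g = rng.standard_normal((trials, n))
    return 2.0 * np.linalg.norm(g, axis=1).mean()

def mean_width_sparse(n, s, trials, rng):
    """Same quantity for K = s-sparse vectors in the unit ball (top-s entries of |g|)."""
    g = rng.standard_normal((trials, n))
    top = np.sort(g ** 2, axis=1)[:, -s:]      # s largest squared entries in each trial
    return 2.0 * np.sqrt(top.sum(axis=1)).mean()

rng = np.random.default_rng(0)
n, s, trials = 1000, 10, 200                   # hypothetical sizes
w_ball = mean_width_ball(n, trials, rng)
w_sparse = mean_width_sparse(n, s, trials, rng)

# The ball has width of order sqrt(n); the sparse set has width of order sqrt(s log(2n/s)).
print(w_ball, 2 * np.sqrt(n))
print(w_sparse, np.sqrt(s * np.log(2 * n / s)))
```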

The main purpose of this paper is to allow the measurement vectors to have general sub-gaussian
(rather than Gaussian) independent coordinates. Recall that a random variable $a$ is
sub-gaussian if its distribution is dominated by a centered normal distribution. This property can
be expressed in several equivalent ways, see [11, Section 5.2.3]. One convenient way to define a
sub-gaussian random variable is to require that its moments be bounded by the corresponding
moments of $N(0, 1)$, so that $(\mathbb{E}|a|^p)^{1/p} = O(\sqrt{p})$ as $p \to \infty$. Formally, $a$ is called sub-gaussian if

(1.5)   $\kappa := \sup_{p \ge 1} p^{-1/2} (\mathbb{E}|a|^p)^{1/p} < \infty.$

The quantity $\kappa$ is called the sub-gaussian norm of $a$. The class of sub-gaussian random variables
includes in particular normal, Bernoulli and all bounded random variables.
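
For concreteness, the quantity in (1.5) can be approximated numerically. The sketch below does so for a standard normal variable, using the closed form $\mathbb{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$, and for a symmetric Bernoulli variable, whose absolute moments all equal 1. The finite grid over $p$ and the function names are assumptions of the sketch, so the output is only an approximation of the supremum.

```python
import math

def log_abs_moment_gaussian(p):
    """log E|g|^p for g ~ N(0,1), from E|g|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)."""
    return 0.5 * p * math.log(2.0) + math.lgamma(0.5 * (p + 1.0)) - 0.5 * math.log(math.pi)

def log_abs_moment_bernoulli(p):
    """log E|a|^p for a symmetric Bernoulli a in {-1, 1}: |a| = 1, so every moment is 1."""
    return 0.0

def subgaussian_norm(log_abs_moment, p_max=100.0, step=0.01):
    """Grid approximation of sup_{p >= 1} p^(-1/2) (E|a|^p)^(1/p)."""
    num = int(round((p_max - 1.0) / step)) + 1
    best = 0.0
    for k in range(num):
        p = 1.0 + k * step
        best = max(best, math.exp(log_abs_moment(p) / p) / math.sqrt(p))
    return best

kappa_gauss = subgaussian_norm(log_abs_moment_gaussian)
kappa_bern = subgaussian_norm(log_abs_moment_bernoulli)
print(kappa_gauss, kappa_bern)

# Taking p = 4 in the definition gives E a^4 <= 16 kappa^4; check it for the Gaussian case.
fourth_moment_gauss = math.exp(log_abs_moment_gaussian(4.0))
print(fourth_moment_gauss, 16 * kappa_gauss ** 4)
```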



^1 One can normalize the signals $x$ and $x'$ to lie on $S^{n-1}$, and the same phenomenon clearly persists.

Our main result is a generalization of [8, Theorem 1.1], which states that when the measurement
vectors $a_1, \ldots, a_m$ are Gaussian, then



with high probability. Our following result generalizes this theorem to when the measurement
vectors $a_1, \ldots, a_m$ have coordinates with sub-gaussian distribution. The only important difference
is that the error now has an additive dependence on $\|x\|_\infty$. This serves to exclude extremely sparse
signals, which can destroy recovery, according to the example we discussed above.

Theorem 1.1 (Estimating a signal with no noise). Let $a \in \mathbb{R}$ be a symmetric, sub-gaussian, and
unit variance random variable with $\kappa$ as in (1.5). Let $a_1, \ldots, a_m$ be independent random vectors in
$\mathbb{R}^n$ whose coordinates are i.i.d. copies of $a$. Consider signal set $K \subseteq B_2^n$, and fix $x \in K$ satisfying
$\|x\|_2 = 1$. Let $y$ follow the 1-bit measurement model of Equation (1.1). Then for each $\beta > 0$, with
probability at least $1 - 4e^{-\beta^2}$, the solution $\hat{x}$ to the optimization problem (1.2) satisfies



In this theorem and later, C and c denote positive absolute constants, which can be different 
from line to line. 

A proof of Theorem 1.1 is given in Section 3. 

This theorem can be easily specialized to sparse (and approximately sparse) signals. To this end,
we consider $K = K_{n,s}$ as in (1.3). A standard computation (see [8, Equation 3.3]) shows that

$w(K_{n,s}) \le C\sqrt{s \log(2n/s)}.$

Then the following corollary follows directly from Theorem 1.1.

Corollary 1.2 (Estimating a sparse signal with no noise). Let K = Kn^s, s > 1, and let everything 
else be as in Theorem 1.1. Then with probability at least 1 — 4exp{— 2slog(2n/s)} > 1 — the 
solution X to the optimization problem (1.2) satisfies 



In words, this result yields that if the signal is approximately $s$-sparse, but not extremely sparse
so that $\|x\|_\infty \ll \|x\|_2 = 1$, then with high probability $x$ can be accurately recovered from

$m = O(s \log(n/s))$

general sub-gaussian measurements.

We also establish a version of Theorem 1.1 under a statistical or noisy model. A noisy random
measurement is modeled by a random variable $y_i$ taking values in $\{-1, 1\}$ such that

(1.6)   $\mathbb{E}(y_i \mid a_i) = \theta(\langle a_i, x \rangle), \qquad i = 1, 2, \ldots, m,$
where $\theta(\cdot)$ is some function, which may even be unknown or unspecified. We only assume that
$\theta(t) \in C^3(\mathbb{R})$, the first three derivatives being bounded by $\tau_1, \tau_2, \tau_3$ respectively, and that

(1.7)   $\mathbb{E}\,\theta(g) g =: \lambda > 0$

where $g \sim N(0, 1)$. For example, in sparse logistic regression one would take

$\theta(t) = \tanh(t/2),$

with bounds $\tau_1 = 0.5$, $\tau_2 \approx 0.19$, $\tau_3 = 0.25$ and $\lambda \approx 0.41$.

To note another important example, observe that the setting of Theorem 1.1 is described by
choosing $\theta(t) = \operatorname{sign}(t)$ and disregarding the differentiability requirements. In this case, $\lambda = \sqrt{2/\pi}$.
It is useful to note that this is the largest possible value $\lambda$ can take, over all possible $\theta(t)$.
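
The value of $\lambda$ in (1.7) can be checked numerically. The sketch below evaluates $\mathbb{E}\,\theta(g)g$ for the logistic choice $\theta(t) = \tanh(t/2)$ by Gauss-Hermite quadrature with respect to the weight $e^{-t^2/2}$, and compares it with the closed form $\mathbb{E}|g| = \sqrt{2/\pi}$ for the sign function. The quadrature degree and the function name are assumptions of the sketch.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(f, deg=80):
    """Approximate E f(g) for g ~ N(0,1) by Gauss-Hermite quadrature (weight exp(-t^2/2))."""
    nodes, weights = hermegauss(deg)
    return float(np.sum(weights * f(nodes)) / np.sqrt(2.0 * np.pi))

# lambda = E theta(g) g for the logistic model theta(t) = tanh(t/2).
lam_logistic = gaussian_expectation(lambda t: np.tanh(t / 2.0) * t)

# For theta = sign the expectation is E|g|, available in closed form.
lam_sign = np.sqrt(2.0 / np.pi)

print(lam_logistic, lam_sign)
```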

The following is a version of Theorem 1.1 under this noisy or statistical model. 

Theorem 1.3 (Estimating a spread signal under random noise). We remain in the setting of
Theorem 1.1, but with random measurements $y_i$ modeled as in Equation (1.6). Then for each
$\beta > 0$, with probability at least $1 - 4e^{-\beta^2}$, the solution $\hat{x}$ to the optimization problem (1.2) satisfies

(1.8) \\x-x\\l<C {{t2 + r3)(n + l)\\x\W/' + j^iHK) + /?) 



For Gaussian measurement vectors $a_i$, a version of this theorem was proved in [8].
The proof of Theorem 1.3 is provided in Section 2.

An interested reader may specialize this result to sparse signals $x$ as we did before, i.e. by taking
$K = K_{n,s}$ and noting as in Corollary 1.2 that $w(K_{n,s}) \le C\sqrt{s \log(2n/s)}$.

Our last result is about non-Gaussian distributions, which nevertheless are close to Gaussian in
total variation. For such measurements, it is reasonable to expect the same conclusions as for
Gaussian measurements, i.e. that the theorems above hold for all signals $x$ without any dependence
on $\|x\|_\infty$. We confirm that this is the case. Suppose that the coordinates of $a_i$ are i.i.d. copies of
a random variable $a$ that satisfies the total variation bound

$\|a - g\|_{TV} := \sup_A |P(a \in A) - P(g \in A)| \le \varepsilon$

where $g \sim N(0, 1)$. In the case when $\theta(t) = \operatorname{sign}(t)$, one has

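$\|\hat{x} - x\|_2^2 \lesssim \varepsilon^{1/8} + \frac{w(K) + \beta}{\sqrt{m}},$

and in the case when $\theta(t) \in C^2$ one has

$\|\hat{x} - x\|_2^2 \lesssim \varepsilon^{1/2} + \frac{w(K) + \beta}{\sqrt{m}}.$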



The precise results and their proofs are provided in the appendix as Theorems 4.1 and 4.4 respectively.



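2. Proof of Theorem 1.3

It will be convenient to define the (rescaled) objective function for our convex program (1.2):

$f_x(x') := \frac{1}{m} \sum_{i=1}^m y_i \langle a_i, x' \rangle.$

We reduce the proof to two main propositions.

Proposition 2.1 (Expectation). Consider $x \in S^{n-1}$, $x' \in B_2^n$. If $x$ satisfies

(2.1)   $\|x\|_\infty \le \frac{\lambda}{C(\tau_2 + \tau_3)\,\mathbb{E}a^4}$

then

(2.2)   $|\mathbb{E} f_x(x') - \langle \lambda x, x' \rangle| \le \frac{C}{\lambda^{1/2}} \big((\tau_2 + \tau_3)(\tau_1 + 1)\,\mathbb{E}a^4\,\|x\|_\infty\big)^{1/2}.$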

The proof of Proposition 2.1 is provided in Section 2.1. 
Proposition 2.2 (Concentration). For each $t > 0$,

$P\Big( \sup_{z \in K - K} |f_x(z) - \mathbb{E} f_x(z)| \ge C\kappa\,\frac{w(K) + t}{\sqrt{m}} \Big) \le 4e^{-t^2}.$

The proof of Proposition 2.2 is provided in Section 2.2. 
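Proof of Theorem 1.3. First observe that if

$\|x\|_\infty \ge \frac{\lambda}{C(\tau_2 + \tau_3)\,\mathbb{E}a^4}$

then one may show that the right-hand side of Equation (1.8) is greater than 4, and the theorem
trivially follows. (In verifying this calculation, note that $\kappa^4/\mathbb{E}a^4 \ge 1/16$ and $1/\lambda \ge \sqrt{\pi/2}$.)

Hence, we may suppose otherwise, in which case Proposition 2.1 applies. Fix $x' \in K$ and consider
$z' = x' - x \in K - K$. Using Proposition 2.1 and the linearity of $f_x$, we find

$-\mathbb{E} f_x(z') = \mathbb{E} f_x(x) - \mathbb{E} f_x(x') \ge \langle \lambda x, x \rangle - \langle \lambda x, x' \rangle - \frac{2C}{\lambda^{1/2}} \big((\tau_2 + \tau_3)(\tau_1 + 1)\,\mathbb{E}a^4 \|x\|_\infty\big)^{1/2}$

$\ge \frac{\lambda}{2} \|x - x'\|_2^2 - \frac{C}{\lambda^{1/2}} \big((\tau_2 + \tau_3)(\tau_1 + 1)\,\mathbb{E}a^4 \|x\|_\infty\big)^{1/2}.$

By Proposition 2.2, we have the following event and a lower bound on its probability, respectively:

$\sup_{z \in K - K} |f_x(z) - \mathbb{E} f_x(z)| \le C\kappa\,\frac{w(K) + \beta}{\sqrt{m}}, \qquad 1 - 4e^{-\beta^2}.$

In this event, note that

$f_x(z') \le \mathbb{E} f_x(z') + C\kappa\,\frac{w(K) + \beta}{\sqrt{m}}$

$\le \frac{C}{\lambda^{1/2}} \big((\tau_2 + \tau_3)(\tau_1 + 1)\,\mathbb{E}a^4 \|x\|_\infty\big)^{1/2} - \frac{\lambda}{2} \|x - x'\|_2^2 + C\kappa\,\frac{w(K) + \beta}{\sqrt{m}}.$

This holds uniformly for all $x' \in K$. Pick $x' = \hat{x}$ and recall that $\hat{x}$ maximizes $f_x$; thus
$f_x(z') = f_x(\hat{x}) - f_x(x) \ge 0$. Thus the right-hand side of the above inequality is bounded below by 0. By
definition of $\kappa$, $\mathbb{E}a^4$ is bounded by $16\kappa^4$. Rearranging completes the proof of the theorem. ■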

2.1. Expectation: Proof of Proposition 2.1. For convenience, let us denote $y := y_1$ and
$a := a_1$. Recalling (1.6), we observe the following equivalences:

(2.3)   $\mathbb{E} f_x(x') = \frac{1}{m} \sum_{i=1}^m \mathbb{E} y_i \langle a_i, x' \rangle = \mathbb{E} y \langle a, x' \rangle = \mathbb{E}\,\mathbb{E}(y \langle a, x' \rangle \mid a) = \mathbb{E}\,\theta(\langle a, x \rangle) \langle a, x' \rangle.$

In order to analyze the above quantity, we will compare to the case when $a$ is Gaussian, in which
case the analysis is fairly simple, see [8, Section 4.1]. Such a comparison is a bi-variate version of the
Berry-Esseen central limit theorem for the function $\theta(\langle a, x \rangle) \langle a, x' \rangle$.

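Lemma 2.3 (Berry-Esseen type central limit theorem). Consider $x, z \in S^{n-1}$. Let $|x|, |z|$ be the
vectors obtained by taking absolute values of the coordinates of $x, z$ respectively. Then

$|\mathbb{E}\,\theta(\langle a, x \rangle) \langle a, z \rangle - \mathbb{E}\,\theta(\langle g, x \rangle) \langle g, z \rangle| \le C(\tau_2 + \tau_3)\,\mathbb{E}a^4\,\big\| |x| + |z| \big\|_3^3.$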


The proof is based on a Lindeberg replacement argument in two variables; it is provided in the 
appendix. 

A challenge arises when we wish to apply Lemma 2.3 to the expectation $\mathbb{E}\,\theta(\langle a, x \rangle) \langle a, x' \rangle$ in (2.3).
Indeed, we have no way to control $\|x'\|_3$, the quantity that is crucial in bounding the difference in
Lemma 2.3.^2 Recall that we have to treat all vectors $x' \in K \subseteq B_2^n$ that arise in the optimization
problem (1.2). Some of these vectors may be very sparse, having $\|x'\|_3 \sim 1$, which produces a
useless bound in Lemma 2.3.

Nevertheless, this obstacle can be bypassed. Observe that both sides of identity (2.3) are linear
in $x'$. This motivates us to define the vector

$v_x := \mathbb{E}\,\theta(\langle a, x \rangle) \cdot a$

and express

$\mathbb{E} f_x(x') = \langle v_x, x' \rangle.$

The conclusion of Proposition 2.1 states that

(2.4)   $v_x \approx \lambda x,$

with the error bound given by the right hand side of (2.2).

This vector approximation may be difficult to prove directly, i.e. based on a Berry-Esseen type
central limit theorem in $n$ dimensions. However, (2.4) clearly follows from the two scalar approximate
identities:

(2.5)   $\langle v_x, x \rangle \approx \lambda$ and $\|v_x\|_2 \approx \lambda.$

(Indeed, the first of these approximate identities states that $v_x$ is near a hyperplane with normal
$x$, and the second one states that $v_x$ is near the centered sphere tangent to that hyperplane.) We
reduced the problem to proving (2.5).

^2 Another problem is the required normalization $x' \in S^{n-1}$, but it is just a minor nuisance which can be addressed
by rescaling.

We shall deduce both inequalities in (2.5) from the Berry-Esseen Lemma 2.3: the first inequality
by choosing $z = x$, and the second by choosing $z = v_x/\|v_x\|_2$. The first inequality is simple.

Lemma 2.4.

$|\langle v_x, x \rangle - \lambda| \le C(\tau_2 + \tau_3)\,\mathbb{E}a^4\,\|x\|_3^3.$

Proof. By definition of $v_x$, we have $\langle v_x, x \rangle = \mathbb{E}\,\theta(\langle a, x \rangle) \langle a, x \rangle$. On the other hand, by rotation
invariance of Gaussian distribution and by condition (1.7), $\mathbb{E}\,\theta(\langle g, x \rangle) \langle g, x \rangle = \lambda$. Thus, the lemma
is a special case of Lemma 2.3 when $z = x$. ■

To apply Berry-Esseen Lemma 2.3 for $z = v_x/\|v_x\|_2$ in order to prove the second inequality in
(2.5), we need to know a two-sided bound on $\|v_x\|_2$ and an upper bound on $\|v_x\|_\infty$. We establish
them in the following two lemmas.

Lemma 2.5. $\lambda/2 \le \|v_x\|_2 \le 1$.

Proof. For the lower bound, using Lemma 2.4, we have

(2.6)   $\|v_x\|_2 = \|v_x\|_2 \|x\|_2 \ge |\langle v_x, x \rangle| \ge \lambda - C(\tau_2 + \tau_3)\,\mathbb{E}a^4\,\|x\|_3^3.$

Suppose the condition (2.1) of Proposition 2.1 holds with this $C$, then

$\|x\|_3^3 \le \|x\|_2^2 \|x\|_\infty = \|x\|_\infty \le \frac{\lambda}{2C(\tau_2 + \tau_3)\,\mathbb{E}a^4}.$

Plugging this into (2.6), we obtain $\|v_x\|_2 \ge \lambda/2$, as desired.
In the other direction, we have

$\|v_x\|_2^2 = \langle v_x, v_x \rangle = \mathbb{E}\,\theta(\langle a, x \rangle) \langle a, v_x \rangle.$

Recall from the definition (1.6) that the function $\theta$ is automatically bounded by 1, so

$\|v_x\|_2^2 \le \mathbb{E}|\langle a, v_x \rangle| \le (\mathbb{E} \langle a, v_x \rangle^2)^{1/2} = (\|v_x\|_2^2\,\mathbb{E}a_1^2)^{1/2} = \|v_x\|_2.$

It follows that $\|v_x\|_2 \le 1$, as desired. ■

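Lemma 2.6. $\|v_x\|_\infty \le \tau_1 \|x\|_\infty$.

Proof. We have $\|v_x\|_\infty = \max_{i \in [n]} |\mathbb{E}\,\theta(\langle a, x \rangle) a_i|$. Let us express in coordinates

$\langle a, x \rangle = \sum_{k=1}^n a_k x_k = S + a_i x_i$

where we denote $S = \sum_{k \ne i} a_k x_k$. By assumption, the distribution of $a_i$ is symmetric. Therefore $a_i$
is identically distributed with $|a_i| \varepsilon$, where $\varepsilon$ denotes an independent symmetric Bernoulli random
variable. Conditioning on $a_i$, we obtain

$\mathbb{E}\,\theta(\langle a, x \rangle) a_i = \mathbb{E}\,\theta(S + |a_i| \varepsilon x_i)\,|a_i| \varepsilon = \frac{1}{2}\,\mathbb{E}\big[\theta(S + |a_i| x_i) - \theta(S - |a_i| x_i)\big]\,|a_i|.$

By the Mean Value Theorem, $\frac{1}{2}|\theta(S + t) - \theta(S - t)| \le \|\theta'\|_\infty t \le \tau_1 t$ for $t > 0$. Thus

$|\mathbb{E}\,\theta(\langle a, x \rangle) a_i| \le \tau_1 |x_i|\,\mathbb{E}|a_i|^2 = \tau_1 |x_i|.$

The conclusion of the lemma easily follows. ■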

We can now turn to the second approximate identity in (2.5). We will only need an upper bound, 
and we formally state it in the following lemma. 



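Lemma 2.7.

$\|v_x\|_2 \le \lambda + \frac{C}{\lambda} (\tau_2 + \tau_3)(\tau_1 + \lambda)\,\mathbb{E}a^4\,\|x\|_\infty.$

Proof. We express

$\|v_x\|_2 = \langle v_x, v_x/\|v_x\|_2 \rangle = \mathbb{E}\,\theta(\langle a, x \rangle) \langle a, z \rangle$, where $z := v_x/\|v_x\|_2.$

We would like to apply the Berry-Esseen type result, Lemma 2.3. The corresponding quantity for
the normal distribution can be easily computed using rotation invariance, see [8, Lemma 4.1]:

$\mathbb{E}\,\theta(\langle g, x \rangle) \langle g, z \rangle = \lambda \langle x, z \rangle.$

Lemma 2.3 then yields

(2.7)   $\|v_x\|_2 \le \lambda \langle x, z \rangle + C(\tau_2 + \tau_3)\,\mathbb{E}a^4\,\big\| |x| + |z| \big\|_3^3.$

Next, Lemma 2.5 and Lemma 2.6 together imply that $\|z\|_\infty \le 2\tau_1 \|x\|_\infty/\lambda$. Hence

$\frac{1}{4} \big\| |x| + |z| \big\|_3^3 \le \frac{1}{4} \big\| |x| + |z| \big\|_2^2 \cdot \big\| |x| + |z| \big\|_\infty$

(2.8)   $\le \big\| |x| + |z| \big\|_\infty \le \|x\|_\infty + \|z\|_\infty \le \Big(1 + \frac{2\tau_1}{\lambda}\Big) \|x\|_\infty.$

Combining (2.7) and (2.8), we complete the proof. ■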

We are ready to deduce Proposition 2.1 from Lemmas 2.4 and 2.7. 
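Proof of Proposition 2.1. By the Cauchy-Schwarz inequality and $\|x'\|_2 \le 1$,

$|\mathbb{E} f_x(x') - \langle \lambda x, x' \rangle| = |\langle v_x, x' \rangle - \langle \lambda x, x' \rangle| \le \|v_x - \lambda x\|_2.$

Next, expanding and rearranging the terms, we have

$\|v_x - \lambda x\|_2^2 = \|v_x\|_2^2 + \lambda^2 \|x\|_2^2 - 2\lambda \langle v_x, x \rangle = (\|v_x\|_2 + \lambda)(\|v_x\|_2 - \lambda) + 2\lambda(\lambda - \langle v_x, x \rangle).$

Now we bound all these terms. As we mentioned in the introduction, $\lambda \le \sqrt{2/\pi}$, and by Lemma 2.5,
$\|v_x\|_2 \le 1$. This controls the factor $\|v_x\|_2 + \lambda$. Lemma 2.7 gives a bound on the factor $\|v_x\|_2 - \lambda$.
Finally, Lemma 2.4 gives a bound on $|\lambda - \langle v_x, x \rangle|$. Putting all these together, we conclude that

$\|v_x - \lambda x\|_2^2 \le C \Big( \frac{1}{\lambda} (\tau_2 + \tau_3)(\tau_1 + \lambda)\,\mathbb{E}a^4 \|x\|_\infty + (\tau_2 + \tau_3)\,\mathbb{E}a^4 \|x\|_3^3 \Big).$

The last line follows from Lemmas 2.4 and 2.7. Since $\|x\|_3^3 \le \|x\|_2^2 \|x\|_\infty = \|x\|_\infty$ and $\lambda \le 1$, this completes
the proof. ■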

2.2. Concentration: Proof of Proposition 2.2. We need to control the random variable 

$Z := \sup_{z \in K - K} |f_x(z) - \mathbb{E} f_x(z)|.$

This will be done using techniques from probability in Banach spaces, following the argument in
[8, Proposition 4.2]. The symmetrization lemma below allows us to essentially replace $Z$ by the
random variable

$Z' := \sup_{z \in K - K} \Big| \frac{1}{m} \sum_{i=1}^m \varepsilon_i y_i \langle a_i, z \rangle \Big|$

where $\varepsilon_i$ denote independent symmetric Bernoulli random variables.

Lemma 2.8 (Symmetrization). We have

(2.9)   $\mathbb{E} Z \le 2\,\mathbb{E} Z'.$

Furthermore, for each $t > 0$ we have the deviation inequality

(2.10)   $P(Z \ge 2\,\mathbb{E} Z + t) \le 4\,P(Z' > t/2).$

The proof of this result is identical to the proof of [8, Lemma 5.1] for the normal distribution. 
The following is a standard Gaussian concentration inequality, which is a simple extension of [5, 
Theorem 7.1]. 

Lemma 2.9 (Gaussian concentration). Given a set $K \subseteq B_2^n$, we have

$P\Big( \sup_{z \in K - K} \langle g, z \rangle - w(K) > r \Big) \le e^{-r^2/8}, \qquad r > 0.$

The following inequality is an adaptation of [6, Lemma 4.6]: 

Lemma 2.10 (Contraction Principle). Consider sequences of independent symmetric random variables
$\eta_i$ and $\xi_i$ such that for some scalar $M \ge 1$, and every $i$ and $t > 0$,

$P(|\eta_i| > t) \le M\,P(|\xi_i| > t).$

Then for any finite sequence $x_i$ and an integer $p \ge 1$, we have

$\mathbb{E} \Big\| \sum_{i=1}^n \eta_i x_i \Big\|^p \le \mathbb{E} \Big( M \Big\| \sum_{i=1}^n \xi_i x_i \Big\| \Big)^p.$


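We will first apply Lemma 2.10 to derive a moment bound on $Z'$, and then convert the moment
bound back into a tail bound to apply in the right-hand side of Equation (2.10).
Because $\varepsilon_i y_i a_i$ has the same distribution as $a_i$, and by the symmetry of $K - K$,

$\mathbb{E}(Z')^p = \mathbb{E} \Big( \sup_{z \in K - K} \frac{1}{m} \sum_{i=1}^m \sum_{j=1}^n (a_i)_j z_j \Big)^p.$

We apply Lemma 2.10 with $(a_i)_j$ in place of $\eta_i$, $e_i e_j^T$ in place of $x_i$ (where $e_i$ is the $i$-th standard
basis vector), $(g_i)_j$ in place of $\xi_i$ as independent $N(0, 1)$ random variables, and the matrix semi-norm defined by
$\|A\| := \sup_{z \in K - K} \frac{1}{m} \big| \sum_{i,j} A_{ij} z_j \big|$. To this end, recall that $(a_i)_j$ are distributed identically with $a$. Since
$a$ is a sub-gaussian random variable, it follows from definition (1.5) that

$P(|a| > t) \le C\,P(|g| \cdot \kappa > t), \qquad t > 0.$

Therefore an application of Lemma 2.10 allows us to replace $(a_i)_j$ by $(C\kappa)(g_i)_j$ and thus conclude
that

(2.11)   $\mathbb{E}(Z')^p \le \mathbb{E} \Big( \sup_{z \in K - K} \frac{C\kappa}{m} \sum_{i=1}^m \sum_{j=1}^n (g_i)_j z_j \Big)^p = \mathbb{E} \Big( \frac{C\kappa}{\sqrt{m}} \sup_{z \in K - K} \langle g, z \rangle \Big)^p.$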

To further develop this inequality, we express the Gaussian concentration tail bound (Lemma
2.9) in terms of moment bounds. For convenience, define

$\zeta := \sup_{z \in K - K} \langle g, z \rangle.$

Using Lemma 2.9 and the equivalence of sub-gaussian properties, for instance in [11, Lemma 5.5],
we have

$\big( \mathbb{E}(\zeta - w(K))_+^p \big)^{1/p} \le C\sqrt{p}.$

Above $(\zeta - w(K))_+ := \max(\zeta - w(K), 0)$. Applying Minkowski's inequality gives

$(\mathbb{E} \zeta^p)^{1/p} \le \big( \mathbb{E}(\zeta - w(K))_+^p \big)^{1/p} + (\mathbb{E}\,w(K)^p)^{1/p} \le C\sqrt{p} + w(K).$

Combine this with Equation (2.11) to give the moment bound

(2.12)   $(\mathbb{E}(Z')^p)^{1/p} \le C\kappa\,\frac{\sqrt{p} + w(K)}{\sqrt{m}}.$

In order to convert this into a tail bound, fix $\beta > 0$ and let $p \in [\beta^2, \beta^2 + 1)$ be $\beta^2$ rounded up to the
next integer. Further, let $t = e \cdot (\mathbb{E}(Z')^p)^{1/p}$. Then, by Markov's inequality we have

(2.13)   $P(Z' > t) \le \frac{\mathbb{E}(Z')^p}{t^p} = e^{-p} \le e^{-\beta^2}$, where $t \le C\kappa\,\frac{w(K) + \beta}{\sqrt{m}}.$


To complete the proof of the proposition, apply Lemma 2.8: The moment bound (2.12) with 
p = 1 controls E(Z') and the tail bound (2.13) controls the right-hand side of Equation (2.10). 
3. Proof of Theorem 1.1

First observe that the proof of Proposition 2.2 is independent of $\theta(t)$ and in particular it holds
for the sign function. The theorem then follows easily from an analogue of Proposition 2.1:

Proposition 3.1 (Expectation). Consider $x, x' \in S^{n-1}$. If $x$ satisfies $\|x\|_\infty \le \lambda/(C\,\mathbb{E}|a|^3)$, then

$|\mathbb{E} f_x(x') - \langle \lambda x, x' \rangle| \le \frac{C}{\lambda^{1/2}}\,\mathbb{E}|a|^3\,\|x\|_\infty^{1/2}.$

To prove the proposition, we prove analogues of the lemmas used to prove the original proposition 
(Proposition 2.1). 

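Lemma 3.2. $|\langle v_x, x \rangle - \lambda| \le C\,\mathbb{E}|a|^3\,\|x\|_3^3$.

Proof. Recall that by definition of $v_x$,

$\langle v_x, x \rangle = \mathbb{E} \operatorname{sign}(\langle a, x \rangle) \langle a, x \rangle = \mathbb{E}|\langle a, x \rangle|.$

Note that $\lambda = \sqrt{2/\pi} = \mathbb{E}|g|$ and thus, to prove the lemma, we wish to bound the difference
$|\mathbb{E}|\langle a, x \rangle| - \mathbb{E}|g||$. We have

$\mathbb{E}|\langle a, x \rangle| - \mathbb{E}|g| = \int_0^\infty \big( P(|\langle a, x \rangle| > t) - P(|g| > t) \big)\,dt = 2 \int_0^\infty \big( P(\langle a, x \rangle > t) - P(g > t) \big)\,dt.$

To conclude, we apply a Berry-Esseen result, for instance as in [9, Theorem 2.1.24], which bounds
the above quantity by

$C \sum_{i=1}^n \mathbb{E}|x_i a_i|^3 = C\,\mathbb{E}|a|^3\,\|x\|_3^3.$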
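
For symmetric Bernoulli coordinates the quantity $\mathbb{E}|\langle a, x \rangle|$ can be computed exactly, which gives a quick sanity check of the role of $\|x\|_3^3$. The sketch below takes $x$ with $k$ nonzero entries equal to $1/\sqrt{k}$ (so that $\|x\|_3^3 = k^{-1/2}$) and compares $\mathbb{E}|\langle a, x \rangle|$ with $\lambda = \sqrt{2/\pi}$; the function name and the chosen values of $k$ are assumptions of the sketch, and no particular value of the constant $C$ is implied.

```python
import math

def mean_abs_bernoulli_sum(k):
    """Exact E|<a, x>| when a has i.i.d. +-1 coordinates and x has k nonzero entries 1/sqrt(k)."""
    # <a, x> = (2B - k)/sqrt(k) with B ~ Binomial(k, 1/2).
    total = sum(math.comb(k, j) * abs(2 * j - k) for j in range(k + 1))
    return total / (2 ** k * math.sqrt(k))

lam = math.sqrt(2.0 / math.pi)
gaps = {}
for k in (1, 4, 100):
    gaps[k] = abs(mean_abs_bernoulli_sum(k) - lam)
    # The gap shrinks together with ||x||_3^3 = 1/sqrt(k).
    print(k, gaps[k], 1.0 / math.sqrt(k))
```

The extremely sparse case $k = 1$ gives the largest gap, which matches the counterexample from the introduction.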



Observe that as a result of Lemma 3.2, Lemma 2.5 is proven as before but requiring only

$\|x\|_3^3 \le \|x\|_\infty \le \lambda/(2C\,\mathbb{E}|a|^3).$

An analogue for Lemma 2.6, however, requires a different approach.

Lemma 3.3. $\|v_x\|_\infty \le C\,\mathbb{E}|a|^3\,\|x\|_\infty$.

Proof. Establishing the notation $\langle a, x \rangle = \sum_{k=1}^n a_k x_k$ where without loss of generality, $x_i > 0$, define
for convenience $S = \sum_{k \ne i} a_k x_k$ and let $F_S$ be the cumulative distribution function of $S$. Consider
an arbitrary constant $r$. Then

$|\mathbb{E}\,\theta(S + r x_i) \cdot r| = |r| \Big| \int_{\mathbb{R}} \operatorname{sign}(t + r x_i)\,dF_S(t) \Big| = |r| \Big| \int_{t > -r x_i} dF_S(t) - \int_{t < -r x_i} dF_S(t) \Big|$

$= |r|\,|P(S > -r x_i) - P(S < -r x_i)| = |r|\,P(|S| < |r| x_i)$

$\le |r|\,P(|g| < |r| x_i) + |r| \cdot |P(|g| < |r| x_i) - P(|S| < |r| x_i)|.$

The second term in the last inequality may be bounded using the Berry-Esseen Theorem, which
may be found for instance in [9, Theorem 2.1.30]. This gives


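$|\mathbb{E}\,\theta(S + r x_i) \cdot r| \le |r| \Big( C |r| x_i + \frac{C\,\mathbb{E}|a|^3 \sum_{k \ne i} |x_k|^3}{(\sum_{k \ne i} x_k^2)^{3/2}} \Big).$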

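Note $\|x\|_3^3 \le \|x\|_\infty \|x\|_2^2 = \|x\|_\infty \le 1/8$, where the last inequality is by assumption. Then
$x_i^3 \le 1/8$, $x_i^2 \le 1/4$, so that $\sum_{k \ne i} x_k^2 \ge 3/4$. Observing furthermore that $\|x\|_\infty \ge \sum_{k \ne i} x_k^2 \|x\|_\infty \ge \sum_{k \ne i} |x_k|^3$,
we have the bound

$|\mathbb{E}\,\theta(S + r x_i) \cdot r| \le C r^2 x_i + C |r|\,\mathbb{E}|a|^3 \|x\|_\infty.$

We may express a single coordinate of $v_x = \mathbb{E}\,\theta(\langle a, x \rangle) \cdot a$ as $\mathbb{E}\,\theta(\langle a, x \rangle) \cdot a_i$. Then,

$|\mathbb{E}\,\theta(\langle a, x \rangle) \cdot a_i| \le \int_{\mathbb{R}} |\mathbb{E}\,\theta(S + t x_i) \cdot t|\,dF_{a_i}(t)$

$\le \int_{\mathbb{R}} \big( C t^2 x_i + C |t|\,\mathbb{E}|a|^3 \|x\|_\infty \big)\,dF_{a_i}(t)$

$= C x_i\,\mathbb{E}a_i^2 + C\,\mathbb{E}|a|^3 \|x\|_\infty\,\mathbb{E}|a_i|$

$\le C x_i + C\,\mathbb{E}|a|^3 \|x\|_\infty.$

Observing that $\mathbb{E}|a|^3 \ge \mathbb{E}a^2 = 1$ completes the proof of the lemma. ■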

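Defining $z = v_x/\|v_x\|_2$, applying Lemma 2.5 and Lemma 3.3 yields $\|z\|_3^3 \le \|z\|_\infty \le C\,\mathbb{E}|a|^3 \|x\|_\infty/\lambda$.
Hence,

$\|v_x\|_2 = \langle v_x, v_x/\|v_x\|_2 \rangle = \mathbb{E} \operatorname{sign}(\langle a, x \rangle) \langle a, z \rangle$

$\le \mathbb{E} \operatorname{sign}(\langle a, z \rangle) \langle a, z \rangle = \langle v_z, z \rangle$

$\le \lambda + C\,\mathbb{E}|a|^3 \|z\|_3^3 \le \lambda + \frac{C}{\lambda} (\mathbb{E}|a|^3)^2 \|x\|_\infty.$

Combining results, we have

$|\langle v_x, x' \rangle - \langle \lambda x, x' \rangle|^2 \le \|v_x - \lambda x\|_2^2 = \|v_x\|_2^2 - \lambda^2 + 2\lambda(\lambda - \langle v_x, x \rangle)$

$= (\|v_x\|_2 + \lambda)(\|v_x\|_2 - \lambda) + 2\lambda(\lambda - \langle v_x, x \rangle)$

$\le C \Big( \frac{1}{\lambda} (\mathbb{E}|a|^3)^2 \|x\|_\infty + \mathbb{E}|a|^3 \|x\|_3^3 \Big).$

Recalling that $\|x\|_\infty \ge \|x\|_3^3$, we may collect terms to conclude

$|\mathbb{E} f_x(x') - \langle \lambda x, x' \rangle| \le \frac{C}{\lambda^{1/2}}\,\mathbb{E}|a|^3\,\|x\|_\infty^{1/2}.$

This completes the proof of Proposition 3.1.

Theorem 1.1 follows as in the proof of Theorem 1.3 in Section 2 and by noting $\lambda = \sqrt{2/\pi}$.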
4. Conclusion

In contrast to standard compressed sensing, one-bit compressed sensing is infeasible when the
measurement vectors are Bernoulli and the signal is extremely sparse. Nevertheless, we show that
when the signal is sparse, but not overly sparse, it may be recovered from Bernoulli (or more 
generally, sub-gaussian) one-bit measurements. To our knowledge, these are the first theoretical 
results in one-bit compressed sensing that specifically allow non-Gaussian measurements. 
Appendix

4.1. Proof of Lemma 2.3. We apply Lindeberg replacement argument in a way similar to [10,
Proposition D.2]. Define $v_j = (x_j, z_j)$, and let $g \in \mathbb{R}^n$ be a vector of independent standard Gaussian
variables. Define $S_i = \sum_{j=1}^{i-1} a_j v_j + \sum_{j=i+1}^n g_j v_j$ and $\phi(v) = \theta(x) z$ (where $v = (x, z)$). Then note
by telescoping,

$|\mathbb{E}\,\theta(\langle a, x \rangle) \langle a, z \rangle - \mathbb{E}\,\theta(\langle g, x \rangle) \langle g, z \rangle| \le \sum_{i=1}^n |\mathbb{E}\,\phi(S_i + a_i v_i) - \mathbb{E}\,\phi(S_i + g_i v_i)|.$



i=l 



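By Taylor's theorem with remainder, we have

$\phi(S_i + a_i v_i) = \phi(S_i) + \sum_{|\alpha|=1} (a_i v_i)^\alpha \partial^\alpha \phi(S_i) + \sum_{|\alpha|=2} \frac{(a_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i) + \sum_{|\alpha|=3} \frac{(a_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i')$

for some $S_i'$ on the line segment joining $S_i$ and $S_i + a_i v_i$. A similar result holds for $\phi(S_i + g_i v_i)$ with
respective $S_i''$. Observe that since $\mathbb{E}a = \mathbb{E}g = 0$ and $\mathbb{E}a^2 = \mathbb{E}g^2 = 1$, the zeroth to second order
terms cancel upon taking expectations in the difference

$\mathbb{E}\,\phi(S_i + a_i v_i) - \mathbb{E}\,\phi(S_i + g_i v_i) = \mathbb{E} \sum_{|\alpha|=3} \frac{(a_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i') - \mathbb{E} \sum_{|\alpha|=3} \frac{(g_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i'').$

Consider the first expectation on the right-hand side. Observe that the partials in the error vanish
except when at most one partial is taken on the second argument of $\phi$, yielding either $\theta''(x)$ or
$\theta'''(x) z$. Furthermore, note that since $S_i'$ is on the line segment joining $S_i$ and $S_i + a_i v_i$, we may
apply the bound $|(S_i')_2| \le |(S_i)_2| + |a_i z_i|$ to conclude

$\Big| \mathbb{E} \sum_{|\alpha|=3} \frac{(a_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i') \Big| \le \mathbb{E} \sum_{|\alpha|=3} |a_i v_i|^\alpha \big( \|\theta''\|_\infty + \|\theta'''\|_\infty (|(S_i)_2| + |a_i z_i|) \big)$

$\le (|x_i| + |z_i|)^3 \big( \tau_2\,\mathbb{E}|a_i|^3 + \tau_3 (\mathbb{E}|(S_i)_2 a_i^3| + |z_i|\,\mathbb{E}a_i^4) \big).$

Observe that $(S_i)_2$ and $a_i$ are independent, and $(\mathbb{E}|(S_i)_2|)^2 \le \mathbb{E}(S_i)_2^2 \le 1$ by Cauchy-Schwarz
and that the variance of an independent sum is a sum of variances. Further observing that $|z_i| \le 1$
and $\mathbb{E}|a|^3 \le \mathbb{E}a^4$, we may collect terms to conclude

$\Big| \mathbb{E} \sum_{|\alpha|=3} \frac{(a_i v_i)^\alpha}{\alpha!} \partial^\alpha \phi(S_i') \Big| \le (|x_i| + |z_i|)^3 (\tau_2 + 2\tau_3)\,\mathbb{E}a^4.$

A similar bound follows for the remainder from the Gaussian expansion, so that summing over
$i$ from 1 to $n$, and observing that the Gaussian remainder can be absorbed since $\mathbb{E}a^4 \ge \mathbb{E}a^2 = 1$,

$|\mathbb{E}\,\theta(\langle a, x \rangle) \langle a, z \rangle - \mathbb{E}\,\theta(\langle g, x \rangle) \langle g, z \rangle| \le C(\tau_2 + \tau_3)\,\mathbb{E}a^4\,\big\| |x| + |z| \big\|_3^3,$

which completes the proof of the lemma.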



4.2. Total variation: sign function. We consider the setting of Theorem 1.1, where $\theta(t) = \operatorname{sign}(t)$,
with the additional assumption that $\|a - g\|_{TV} \le \varepsilon$.

Theorem 4.1 (Estimating a signal with no noise). We remain in the setting of Theorem 1.1 with
the additional condition $\|a - g\|_{TV} \le \varepsilon$. Then for each $\beta > 0$, with probability at least $1 - 4e^{-\beta^2}$,
the solution $\hat{x}$ to the optimization problem (1.2) satisfies



$\|\hat{x} - x\|_2^2 \le C\sqrt{\kappa}\,\varepsilon^{1/8} + \frac{C\kappa}{\sqrt{m}} (w(K) + \beta).$



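As with Theorem 1.1, the main result follows easily from an analogue of Proposition 2.1. Below,
$\lambda = \sqrt{2/\pi}$.

Proposition 4.2 (Expectation). For $x, x' \in S^{n-1}$,

$|\mathbb{E} f_x(x') - \langle \lambda x, x' \rangle| \le C (\mathbb{E}a^4)^{1/8} \varepsilon^{1/8}.$

To prove the proposition, we prove an analogue of a lemma used to prove Proposition 2.1.

Lemma 4.3. $|\langle v_x, x \rangle - \lambda| = |\mathbb{E}|\langle a, x \rangle| - \mathbb{E}|g|| \le 4 (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/4} \varepsilon^{1/4}$.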



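Proof. We first prove a variant of the Berry-Esseen result on expectations, applying Lindeberg
replacement. Define $S_i = \sum_{j=1}^{i-1} a_j x_j + \sum_{j=i+1}^n g_j x_j$, and let $\phi(x)$ be a twice differentiable function.
We will later use an approximation argument to replace $\phi$ by the absolute value function.
Note by telescoping,

$|\mathbb{E}\,\phi(\langle a, x \rangle) - \mathbb{E}\,\phi(\langle g, x \rangle)| = \Big| \mathbb{E}\,\phi\Big( \sum_{i=1}^n a_i x_i \Big) - \mathbb{E}\,\phi\Big( \sum_{i=1}^n g_i x_i \Big) \Big| \le \sum_{i=1}^n |\mathbb{E}\,\phi(S_i + a_i x_i) - \mathbb{E}\,\phi(S_i + g_i x_i)|.$

For convenience, dropping subscripts, we now wish to bound $|\mathbb{E}\,\phi(S + ax) - \mathbb{E}\,\phi(S + gx)|$.
By Taylor's theorem with remainder, we have

$\phi(S + ax) = \phi(S) + ax\,\phi'(S) + R(S, ax)$

where $|R(S, ax)| \le (ax)^2 \|\phi''\|_\infty/2$. A similar result holds for $\phi(S + gx)$.

Split $R$ into its positive and negative parts, $R = R_+ - R_-$ with $R_+ \ge 0$ and $R_- \ge 0$. Observe that
since $\mathbb{E}a = \mathbb{E}g = 0$, the zeroth and first order terms cancel upon taking expectations in the difference

$|\mathbb{E}\,\phi(S + ax) - \mathbb{E}\,\phi(S + gx)| = |\mathbb{E}R(S, ax) - \mathbb{E}R(S, gx)|$

$\le |\mathbb{E}R_+(S, ax) - \mathbb{E}R_+(S, gx)| + |\mathbb{E}R_-(S, ax) - \mathbb{E}R_-(S, gx)|.$

Consider the difference with $R_+$. We will apply the assumption $\|a - g\|_{TV} \le \varepsilon$. First, observe
that $S$ is independent of both $a$ and $g$ and may be viewed as a constant. Viewing for instance
$R_+(S, ax)$ as a function of $a$, for any $M > 0$,

$\Big| \int_0^M P(R_+(S, ax) > t)\,dt - \int_0^M P(R_+(S, gx) > t)\,dt \Big| \le M\varepsilon.$

Then, consider the tail of the first integral:

$\int_M^\infty P(R_+(S, ax) > t)\,dt \le \int_M^\infty \frac{\mathbb{E}(R_+(S, ax)^2)}{t^2}\,dt = \frac{\mathbb{E}R_+(S, ax)^2}{M} \le \frac{x^4\,\mathbb{E}a^4\,\|\phi''\|_\infty^2}{4M}.$

The Gaussian tail yields a similar error. Hence, optimizing over $M$ by choosing

$M = \frac{x^2 (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/2} \|\phi''\|_\infty}{2\sqrt{\varepsilon}},$

we have overall error

$|\mathbb{E}R_+(S, ax) - \mathbb{E}R_+(S, gx)| \le x^2 (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/2} \|\phi''\|_\infty \sqrt{\varepsilon}.$

The same holds for the difference with $R_-$. Finally, summing over the $n$ indices, and using that
$\sum_{i=1}^n x_i^2 = 1$,

$|\mathbb{E}\,\phi(\langle a, x \rangle) - \mathbb{E}\,\phi(\langle g, x \rangle)| \le 2 (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/2} \|\phi''\|_\infty \sqrt{\varepsilon}.$

Second, we approximate the absolute value using $\phi(x) := \sqrt{c + x^2} \approx |x|$. Observe for instance
that $|\mathbb{E}|\langle a, x \rangle| - \mathbb{E}\,\phi(\langle a, x \rangle)| \le \sqrt{c}$, and likewise with $g$ in the place of $a$. Evaluating
$\phi''(x) = c/(c + x^2)^{3/2}$ with a maximum of $1/\sqrt{c}$ at $x = 0$, we may conclude

$|\langle v_x, x \rangle - \lambda| = |\mathbb{E}|\langle a, x \rangle| - \mathbb{E}|\langle g, x \rangle|| \le 2\sqrt{c} + 2 (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/2} \sqrt{\varepsilon/c}.$

Choosing $\sqrt{c} = (\mathbb{E}a^4 + \mathbb{E}g^4)^{1/4} \varepsilon^{1/4}$ completes the proof of the lemma. ■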
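
The two facts about the smoothing $\sqrt{c + x^2}$ of the absolute value that the argument relies on, namely that the approximation error is at most $\sqrt{c}$ and that the second derivative is at most $1/\sqrt{c}$, can be checked on a grid. The value of $c$, the grid and the function names below are assumptions of this sketch.

```python
import numpy as np

def smooth_abs(x, c):
    """Smooth surrogate sqrt(c + x^2) for |x|."""
    return np.sqrt(c + x ** 2)

def smooth_abs_second_derivative(x, c):
    """Second derivative of sqrt(c + x^2), which equals c / (c + x^2)^(3/2)."""
    return c / (c + x ** 2) ** 1.5

c = 0.01                                    # hypothetical smoothing parameter
x = np.linspace(-5.0, 5.0, 10001)           # grid containing (numerically) the point 0

gap = smooth_abs(x, c) - np.abs(x)          # approximation error, largest at x = 0
max_gap = gap.max()
max_curv = smooth_abs_second_derivative(x, c).max()

print(max_gap, np.sqrt(c))                  # error bound sqrt(c)
print(max_curv, 1.0 / np.sqrt(c))           # curvature bound 1/sqrt(c)
```

A smaller $c$ tightens the approximation but inflates the curvature, which is the trade-off balanced by the final choice of $\sqrt{c}$ in the proof.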

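We now proceed to bound $\|v_x\|_2$, thus obtaining the second geometric constraint required in the proof of the proposition. We apply Lemma 4.3 with $z := v_x/\|v_x\|_2$ in the place of $x$:
\[
\begin{aligned}
\|v_x\|_2 = \langle v_x, v_x/\|v_x\|_2\rangle &= \mathbb{E}\,\mathrm{sign}(\langle a,x\rangle)\langle a,z\rangle \\
&\le \mathbb{E}\,\mathrm{sign}(\langle a,z\rangle)\langle a,z\rangle = \langle v_z, z\rangle \le \lambda + 4(\mathbb{E}a^4 + \mathbb{E}g^4)^{1/4}\varepsilon^{1/4}.
\end{aligned}
\]
An additional direct application of the lemma yields
\[
\begin{aligned}
|\langle v_x,x'\rangle - \langle\lambda x,x'\rangle|^2 \le \|v_x - \lambda x\|_2^2 &= \|v_x\|_2^2 - \lambda^2 + 2\lambda(\lambda - \langle v_x,x\rangle) \\
&= (\|v_x\|_2 + \lambda)(\|v_x\|_2 - \lambda) + 2\lambda(\lambda - \langle v_x,x\rangle) \\
&\le 16(\mathbb{E}a^4 + \mathbb{E}g^4)^{1/2}\varepsilon^{1/4}.
\end{aligned}
\]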
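The algebra behind this last display is the expansion of $\|v_x - \lambda x\|_2^2$ for a unit vector $x$, followed by Cauchy-Schwarz against the unit vector $x'$. A quick numerical sanity check, with made-up vectors and assuming $\lambda = \mathbb{E}|g| = \sqrt{2/\pi}$ for a standard normal $g$:

```python
import math

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

lam = math.sqrt(2.0 / math.pi)   # E|g| for standard normal g (assumed value of lambda)
x = [0.6, 0.8, 0.0]              # unit vector
x_prime = [0.0, 0.6, 0.8]        # another unit vector
v = [0.5, 0.7, 0.1]              # hypothetical stand-in for v_x

diff = [vi - lam * xi for vi, xi in zip(v, x)]
lhs = dot(diff, diff)                                         # ||v - lam x||^2
rhs = dot(v, v) - lam ** 2 + 2.0 * lam * (lam - dot(v, x))    # expanded form
cs_side = (dot(v, x_prime) - lam * dot(x, x_prime)) ** 2      # |<v - lam x, x'>|^2
```

The identity `lhs == rhs` (up to rounding) is what lets the two geometric constraints, on $\|v_x\|_2$ and on $\langle v_x, x\rangle$, control the inner product with an arbitrary $x'$.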

Proposition 4.2 is a consequence of absorbing constants, and Theorem 4.1 follows as in the proof 
of Theorem 1.3 in Section 2. 
4.3. Total variation: smooth noise model. We consider the setting of Theorem 1.3, with the additional assumption that $\|a - g\|_{TV} \le \varepsilon$. We also relax the assumption on $\theta(t)$, defined as in (1.6), to $\theta(t) \in C^2$.

Theorem 4.4 (Estimating a signal with noise). We remain in the setting of Theorem 1.3 with the additional condition $\|a - g\|_{TV} \le \varepsilon$, and also relax the condition on $\theta(t)$ to $\theta(t) \in C^2$. Then for each $\beta > 0$, with probability at least $1 - 4e^{-\beta^2}$, the solution $\hat{x}$ to the optimization problem (1.2) satisfies
\[
\|\hat{x} - x\|_2^2 \le C\Big((\kappa^3 + 1)(\tau_1 + \tau_2)\sqrt{\varepsilon} + [\ldots]\,(w(K) + \beta)\Big).
\]

As with Theorem 1.1, the main result follows easily from an analogue of Proposition 2.1: 

Proposition 4.5 (Expectation). For $x, x' \in S^{n-1}$,
\[
|\mathbb{E}f_x(x') - \langle\lambda x,x'\rangle| \le 8(\mathbb{E}a^6 + \mathbb{E}g^6)^{1/2}(\tau_1 + \tau_2)\sqrt{\varepsilon}.
\]

Because we intend to have no dependence on x, the required generality of x' is no additional 
burden. As a result, it is possible to prove the proposition directly. 

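Proof. Recalling [2, Lemma 4.1], observe that the left hand side of the inequality is expressible as
\[
|\mathbb{E}f_x(x') - \langle\lambda x,x'\rangle| = |\mathbb{E}\theta(\langle a,x\rangle)\langle a,x'\rangle - \mathbb{E}\theta(\langle g,x\rangle)\langle g,x'\rangle|.
\]

The statement of the proposition becomes similar to that of Lemma 2.3. Using the same notation and proceeding as in its proof (hence for instance using $z$ in place of $x'$), we apply Lindeberg replacement:
\[
|\mathbb{E}\theta(\langle a,x\rangle)\langle a,z\rangle - \mathbb{E}\theta(\langle g,x\rangle)\langle g,z\rangle| \le \sum_{i=1}^n |\mathbb{E}\phi(S_i + a_iv_i) - \mathbb{E}\phi(S_i + g_iv_i)|.
\]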

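As before, we Taylor expand, except only to second order error:
\[
\phi(S_i + a_iv_i) = \phi(S_i) + \sum_{|\alpha|=1}(a_iv_i)^\alpha\,\partial^\alpha\phi(S_i) + R(S_i, a_iv_i),
\]
where $R(S_i, a_iv_i) = \frac{1}{2}\sum_{|\alpha|=2}(a_iv_i)^\alpha\,\partial^\alpha\phi(S_i')$ for some $S_i'$ on the line segment joining $S_i$ and $S_i + a_iv_i$. A similar result holds with $\phi(S_i + g_iv_i)$, with a respective intermediate point.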

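Split $R(S,v)$ into its positive and negative parts, $R = R_+ - R_-$ with $R_+(S,v) \ge 0$ and $R_-(S,v) \ge 0$. Observe that since $\mathbb{E}a = \mathbb{E}g = 0$, the zeroth and first order terms cancel upon taking expectations in the difference:
\[
\begin{aligned}
|\mathbb{E}\phi(S_i + a_iv_i) - \mathbb{E}\phi(S_i + g_iv_i)| &= |\mathbb{E}R(S_i,a_iv_i) - \mathbb{E}R(S_i,g_iv_i)| \\
&\le |\mathbb{E}R_+(S_i,a_iv_i) - \mathbb{E}R_+(S_i,g_iv_i)| + |\mathbb{E}R_-(S_i,a_iv_i) - \mathbb{E}R_-(S_i,g_iv_i)|.
\end{aligned}
\]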

Consider the difference containing $R_+$. We will apply the assumption $\|a - g\|_{TV} \le \varepsilon$. First, observe that $S_i$ is independent of both $a_i$ and $g_i$ and may be viewed as a constant (by conditioning on it). Viewing for instance $R_+(S_i,a_iv_i)$ as a function of $a_i$,
\[
\Big|\int_0^M P(R_+(S_i,a_iv_i) > t)\,dt - \int_0^M P(R_+(S_i,g_iv_i) > t)\,dt\Big| \le M\varepsilon.
\]
Then, consider the tail of the first integral:
\[
\int_M^\infty P(R_+(S_i,a_iv_i) > t)\,dt \le \int_M^\infty \frac{\mathbb{E}R_+(S_i,a_iv_i)^2}{t^2}\,dt = \frac{\mathbb{E}R_+(S_i,a_iv_i)^2}{M}.
\]

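Recall the explicit form of the remainder and observe that the partials in the error vanish except when at most one partial is taken on the second argument of $\phi$, yielding either $\theta'(x)$ or $\theta''(x)z$. Furthermore, note that since $S_i'$ is on the line segment joining $S_i$ and $S_i + a_iv_i$, we may apply the bound $|(S_i')_2| \le |(S_i)_2| + |a_iz_i|$ to conclude
\[
\begin{aligned}
\mathbb{E}\,4R_+(S_i,a_iv_i)^2 \le \mathbb{E}\,4R(S_i,a_iv_i)^2
&\le \mathbb{E}\Big(\sum_{|\alpha|=2} a_i^2|v_i^\alpha|\big(\|\theta'\|_\infty + \|\theta''\|_\infty(|(S_i)_2| + |a_iz_i|)\big)\Big)^2 \\
&\le \mathbb{E}\Big(a_i^2(|x_i| + |z_i|)^2\big(\tau_1 + \tau_2(|(S_i)_2| + |a_iz_i|)\big)\Big)^2.
\end{aligned}
\]

Observe that $(S_i)_2$ and $a_i$ are independent, and that $(\mathbb{E}|(S_i)_2|)^2 \le \mathbb{E}(S_i)_2^2 \le 1$ by Cauchy-Schwarz and the fact that the variance of an independent sum is a sum of variances. Further observing that $|z_i| \le 1$ and for instance $\mathbb{E}|a|^5 \le \mathbb{E}a^6$, rearranging and collecting terms yields
\[
\begin{aligned}
\mathbb{E}\,4R_+(S_i,a_iv_i)^2 &\le (|x_i| + |z_i|)^4\big(4\tau_2^2\,\mathbb{E}a_i^6 + \tau_1^2\,\mathbb{E}a_i^4 + 4\tau_1\tau_2\,\mathbb{E}|a_i|^5\big) \\
&\le 4(|x_i| + |z_i|)^4(\tau_1 + \tau_2)^2\,\mathbb{E}a_i^6.
\end{aligned}
\]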
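The claim about which partial derivatives survive can be made explicit. The sketch below assumes, as the surrounding text suggests, that $\phi(s_1, s_2) = \theta(s_1)\,s_2$ and that $\tau_1$, $\tau_2$ bound $\|\theta'\|_\infty$ and $\|\theta''\|_\infty$ respectively; both are inferred from context rather than restated from the paper's definitions.

```latex
\[
\phi(s_1,s_2) = \theta(s_1)\,s_2: \qquad
\partial_1^2\phi = \theta''(s_1)\,s_2, \quad
\partial_1\partial_2\phi = \theta'(s_1), \quad
\partial_2^2\phi = 0,
\]
\[
\max_{|\alpha|=2}\,|\partial^\alpha\phi(S_i')|
\le \|\theta'\|_\infty + \|\theta''\|_\infty\,|(S_i')_2|
\le \tau_1 + \tau_2\big(|(S_i)_2| + |a_i z_i|\big).
\]
```

Combined with $\sum_{|\alpha|=2}|v_i^\alpha| = x_i^2 + |x_iz_i| + z_i^2 \le (|x_i| + |z_i|)^2$, this is what produces the factor $(|x_i| + |z_i|)^2$ in front of the bracket.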
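The Gaussian tail yields a similar error. Hence, optimizing over $M$ by choosing
\[
M = \frac{(|x_i| + |z_i|)^2(\mathbb{E}a^6 + \mathbb{E}g^6)^{1/2}(\tau_1 + \tau_2)}{\sqrt{\varepsilon}},
\]
we have overall error
\[
|\mathbb{E}R_+(S_i,a_iv_i) - \mathbb{E}R_+(S_i,g_iv_i)| \le 2(|x_i| + |z_i|)^2(\mathbb{E}a^6 + \mathbb{E}g^6)^{1/2}(\tau_1 + \tau_2)\sqrt{\varepsilon}.
\]

The same holds for the difference with $R_-$. Finally, summing over the $n$ indices, and using that $\|x\|_2 = 1$ and $\|z\|_2 = 1$,
\[
|\mathbb{E}\theta(\langle a,x\rangle)\langle a,z\rangle - \mathbb{E}\theta(\langle g,x\rangle)\langle g,z\rangle| \le 8(\mathbb{E}a^6 + \mathbb{E}g^6)^{1/2}(\tau_1 + \tau_2)\sqrt{\varepsilon},
\]
which concludes the proof of the proposition. ■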

We can further simplify the error expression in Proposition 4.5 by observing that 

\[
(\mathbb{E}a^6 + \mathbb{E}g^6)^{1/2} \le C(\kappa^3 + 1).
\]

Then Theorem 4.4 follows as in the proof of Theorem 1.3 in Section 2. 
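To get a feel for the size of the moment factor, recall that a standard normal $g$ has even moments $\mathbb{E}g^{2k} = (2k-1)!!$, so $\mathbb{E}g^4 = 3$ and $\mathbb{E}g^6 = 15$. The sketch below evaluates the factor for a Rademacher (random sign) measurement entry, which is used here purely as a hypothetical example of a sub-gaussian $a$:

```python
import math

def gaussian_even_moment(k):
    """E g^(2k) for a standard normal g, equal to (2k-1)!! = 1 * 3 * ... * (2k-1)."""
    moment = 1
    for j in range(1, 2 * k, 2):
        moment *= j
    return moment

fourth = gaussian_even_moment(2)   # E g^4
sixth = gaussian_even_moment(3)    # E g^6

# Rademacher entries take values +1 or -1, so every even moment equals 1.
rademacher_sixth = 1
moment_factor = math.sqrt(rademacher_sixth + sixth)   # (E a^6 + E g^6)^(1/2)
```

For heavier-tailed sub-gaussian entries the term $\mathbb{E}a^6$ grows like the sixth power of the sub-gaussian norm, which is the source of the $\kappa^3$ in the bound.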